network lasso
Decentralised Traffic Incident Detection via Network Lasso
Zhu, Qiyuan, Qin, A. K., Abeysekara, Prabath, Dia, Hussein, Grzybowska, Hanna
Traffic incident detection plays a key role in intelligent transportation systems and has gained great attention in transport engineering. In the past, traditional machine learning (ML) based detection methods achieved good performance under a centralised computing paradigm, where all data are transmitted to a central server on which the ML models are built. Nowadays, deep neural network-based federated learning (FL) has become a mainstream detection approach that enables model training in a decentralised manner while ensuring local data governance. Such neural network-centred techniques, however, have overshadowed the utility of well-established ML-based detection methods. In this work, we aim to explore the potential of strong conventional ML-based detection models in modern traffic scenarios characterised by distributed data. We leverage an elegant but less explored distributed optimisation framework named Network Lasso, which has guaranteed global convergence for convex problem formulations, integrate a convex ML model with it, and compare it with centralised learning, local learning, and federated learning methods on a well-known traffic incident detection dataset. Experimental results show that the proposed network lasso-based approach provides a promising alternative to the FL-based approach in data-decentralised traffic scenarios, with a strong convergence guarantee, while rekindling the significance of conventional ML-based detection methods.
A Bayesian approach to multi-task learning with network lasso
Shimamura, Kaito, Kawano, Shuichi
Network lasso is a method for solving a multi-task learning problem through the regularized maximum likelihood method. A characteristic of network lasso is setting a different model for each sample. The relationships among the models are represented by relational coefficients. A crucial issue in network lasso is to provide appropriate values for these relational coefficients. In this paper, we propose a Bayesian approach to solve multi-task learning problems by network lasso. This approach allows us to objectively determine the relational coefficients by Bayesian estimation. The effectiveness of the proposed method is shown in a simulation study and a real data analysis.
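For orientation, the network lasso objective that this abstract refers to is usually written in the following form. This is a sketch in assumed notation, not taken from the abstract itself: $f_i$ is the loss of the model attached to sample $i$, $x_i$ its parameter vector, $E$ the set of edges linking related samples, $w_{jk}$ the relational coefficients, and $\lambda \ge 0$ the regularization weight.

```latex
\min_{x_1,\dots,x_n} \;\; \sum_{i=1}^{n} f_i(x_i)
  \;+\; \lambda \sum_{(j,k) \in E} w_{jk} \,\lVert x_j - x_k \rVert_2
```

Because the penalty uses the unsquared $\ell_2$ norm, a large $w_{jk}$ tends to force $x_j = x_k$ exactly, which is why the choice of relational coefficients matters so much.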
Fast local linear regression with anchor regularization
Petrovich, Mathis, Yamada, Makoto
The regression problem is an important problem in machine learning, data mining, and statistics, and many research works have investigated it over the past decades. Examples include stock price prediction [12, 25], age prediction from RNA-seq [6] or images [11], sentiment analysis [15, 18], and house price prediction [7], to name a few. The most widely used regression approaches are based on a linear model and include ordinary least squares (OLS), Ridge regression, the least absolute shrinkage and selection operator (Lasso) [19], and the elastic net [26]. Because these linear models are extremely simple and can be interpreted by simply checking the linear coefficients of the variables, they are widely used in practice. However, one limitation of linear models is that they cannot handle complex nonlinear data; performance can degrade significantly if we apply these linear methods to complex data such as the gene expression data used heavily in biology and healthcare. To handle complex data, researchers tend to use kernel methods such as kernel ridge regression (KRR) and support vector regression (SVR) [16].
On the Duality between Network Flows and Network Lasso
The data arising in many application domains have an intrinsic network structure. Such network structure is computationally appealing due to the availability of highly scalable graph algorithms. An important class of graph algorithms is related to optimizing network flows. This paper explores the duality of network flow methods and the recently proposed network Lasso. Network Lasso extends the Lasso method from sparse linear models to clustered graph signals. It turns out that the computational and statistical properties of network Lasso crucially depend on the existence of sufficiently large network flows. Using elementary tools from convex analysis, we offer a precise characterization of the duality between network Lasso and a minimum cost network flow problem. This duality provides a strong link between network Lasso methods and network flow algorithms.
Analysis of Network Lasso For Semi-Supervised Regression
We characterize the statistical properties of network Lasso for semi-supervised regression problems involving network-structured data. This characterization is based on the connectivity properties of the empirical graph which encodes the similarities between individual data points. Loosely speaking, network Lasso is accurate if the available label information is well connected with the boundaries between clusters of the network-structured datasets. We make this property precise using the notion of network flows. In particular, the existence of a sufficiently large network flow over the empirical graph implies a network compatibility condition which, in turn, ensures accuracy of network Lasso.
Triangle Lasso for Simultaneous Clustering and Optimization in Graph Datasets
Zhao, Yawei, Xu, Kai, Liu, Xinwang, Zhu, En, Zhu, Xinzhong, Yin, Jianping
Recently, network lasso has drawn much attention due to its remarkable performance on simultaneous clustering and optimization. However, it often suffers from imperfect data (noise, missing values, etc.) and yields sub-optimal solutions. The reason is that it identifies similar instances directly from their features, which are themselves affected by the imperfect data. In this paper, we propose triangle lasso to avoid this disadvantage. Triangle lasso finds similar instances according to their neighbours: if two instances have many common neighbours, they tend to become similar. Even when some instances are profiled by imperfect data, it is still able to find their similar counterparts. Furthermore, we develop an efficient algorithm based on the Alternating Direction Method of Multipliers (ADMM) to obtain a moderately accurate solution. In addition, we present a dual method to obtain an accurate solution with low additional time consumption. We demonstrate through extensive numerical experiments that triangle lasso is robust to imperfect data. It usually yields better performance than the state-of-the-art method when performing data analysis tasks in practical scenarios.
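ADMM-style solvers for lasso-type penalties on differences between node parameters typically rely on a closed-form shrinkage step. The following is a minimal sketch of one such building block, the proximal operator of the scaled $\ell_2$ norm (block soft-thresholding). It is not the triangle lasso algorithm itself; the function name and the threshold parameter `tau` are illustrative assumptions.

```python
import numpy as np

def block_soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_2 (hypothetical helper name).

    Shrinks the vector v towards zero by tau in Euclidean norm and
    returns exactly zero when ||v||_2 <= tau.
    """
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm <= tau:
        return np.zeros_like(v)
    return (1.0 - tau / norm) * v

# Difference between two neighbouring node parameters (made-up values).
large_gap = block_soft_threshold([3.0, 4.0], tau=1.0)   # shrunk, direction kept
small_gap = block_soft_threshold([0.3, 0.4], tau=1.0)   # collapsed to zero
```

Returning exactly zero for small differences is what lets neighbouring nodes merge into a single cluster, while large differences are only shrunk.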
Recovery Conditions and Sampling Strategies for Network Lasso
Mara, Alexandru, Jung, Alexander
The network Lasso is a recently proposed convex optimization method for machine learning from massive network-structured datasets, i.e., big data over networks. It is a variant of the well-known least absolute shrinkage and selection operator (Lasso), which underlies many methods in learning and signal processing involving sparse models. Highly scalable implementations of the network Lasso can be obtained by state-of-the-art proximal methods, e.g., the alternating direction method of multipliers (ADMM). By generalizing the concept of the compatibility condition put forward by van de Geer and Buehlmann as a powerful tool for the analysis of plain Lasso, we derive a sufficient condition, i.e., the network compatibility condition (NCC), on the underlying network topology such that network Lasso accurately learns a clustered underlying graph signal. This network compatibility condition relates the location of the sampled nodes to the clustering structure of the network. In particular, the NCC informs the choice of which nodes to sample or, in machine learning terms, which data points provide the most information if labeled.
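To make the setting concrete, here is a minimal sketch that evaluates a network Lasso objective for a scalar graph signal when only some nodes are sampled (labeled). It assumes a squared-error fit on the sampled nodes plus a weighted total-variation penalty over the edges. The function name, the chain graph, the edge weights, and the labels are all made up for illustration.

```python
import numpy as np

def network_lasso_objective(x, y, sampled, edges, weights, lam):
    """Squared error on sampled nodes + lam * weighted total variation.

    x       : candidate graph signal, one value per node
    y       : observed labels (only entries listed in `sampled` are used)
    sampled : indices of labeled nodes
    edges   : list of (i, j) node pairs
    weights : one non-negative weight per edge
    lam     : regularization strength (assumed name)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = sum((x[i] - y[i]) ** 2 for i in sampled)
    tv = sum(w * abs(x[i] - x[j]) for (i, j), w in zip(edges, weights))
    return fit + lam * tv

# Chain of six nodes; the edge (2, 3) is weak and separates two clusters.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
weights = [1.0, 1.0, 0.1, 1.0, 1.0]
y = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 5.0])
sampled = [0, 5]  # one labeled node in each cluster

clustered = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])  # piecewise constant
smooth = np.linspace(1.0, 5.0, 6)                     # linear ramp

obj_clustered = network_lasso_objective(clustered, y, sampled, edges, weights, 1.0)
obj_smooth = network_lasso_objective(smooth, y, sampled, edges, weights, 1.0)
```

Both candidates fit the two labels exactly, but the piecewise-constant signal pays a penalty only on the weak boundary edge, so the objective favours the clustered solution in this toy case.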
Localized Lasso for High-Dimensional Regression
Yamada, Makoto, Takeuchi, Koh, Iwata, Tomoharu, Shawe-Taylor, John, Kaski, Samuel
We introduce the localized Lasso, which is suited for learning models that are both interpretable and have a high predictive power in problems with high dimensionality $d$ and small sample size $n$. More specifically, we consider a function defined by local sparse models, one at each data point. We introduce sample-wise network regularization to borrow strength across the models, and sample-wise exclusive group sparsity (a.k.a., $\ell_{1,2}$ norm) to introduce diversity into the choice of feature sets in the local models. The local models are interpretable in terms of similarity of their sparsity patterns. The cost function is convex, and thus has a globally optimal solution. Moreover, we propose a simple yet efficient iterative least-squares based optimization procedure for the localized Lasso, which does not need a tuning parameter, and is guaranteed to converge to a globally optimal solution. The solution is empirically shown to outperform alternatives for both simulated and genomic personalized medicine data.
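The two regularizers named in this abstract can be sketched as follows. This assumes the network term is a weighted sum of Euclidean distances between local weight vectors over unordered sample pairs, and the exclusive group term is the sum of squared $\ell_1$ norms of the local weight vectors. The function name, the matrix `R` of pairwise weights, and the two regularization parameters are illustrative assumptions, not the authors' code.

```python
import numpy as np

def localized_lasso_penalty(W, R, lam_net, lam_exc):
    """Network penalty + exclusive (squared l1) penalty on local models.

    W       : array of shape (n, d), one weight vector per sample
    R       : symmetric (n, n) array of non-negative pairwise weights
    lam_net : strength of the network regularizer (assumed name)
    lam_exc : strength of the exclusive regularizer (assumed name)
    """
    W = np.asarray(W, dtype=float)
    R = np.asarray(R, dtype=float)
    n = W.shape[0]
    # Network term: linked samples are pulled towards the same model.
    net = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            net += R[i, j] * np.linalg.norm(W[i] - W[j])
    # Exclusive term: squared l1 norm per sample, summed over samples.
    exc = np.sum(np.sum(np.abs(W), axis=1) ** 2)
    return lam_net * net + lam_exc * exc

# Three samples, two features; samples 0 and 1 are linked and share a model.
W = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
R = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
penalty = localized_lasso_penalty(W, R, lam_net=1.0, lam_exc=0.5)
```

Since the linked samples have identical weight vectors, the network term vanishes here and only the exclusive term contributes.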
Robust and scalable Bayesian analysis of spatial neural tuning function data
Rad, Kamiar Rahnama, Machado, Timothy A., Paninski, Liam
A common analytical problem in neuroscience is the interpretation of neural activity with respect to sensory input or behavioral output. This is typically achieved by regressing measured neural activity against known stimuli or behavioral variables to produce a "tuning function" for each neuron. Unfortunately, because this approach handles neurons individually, it cannot take advantage of simultaneous measurements from spatially adjacent neurons that often have similar tuning properties. On the other hand, sharing information between adjacent neurons can errantly degrade estimates of tuning functions across space if there are sharp discontinuities in tuning between nearby neurons. In this paper, we develop a computationally efficient block Gibbs sampler that effectively pools information between neurons to de-noise tuning function estimates while simultaneously preserving sharp discontinuities that might exist in the organization of tuning across space. This method is fully Bayesian and its computational cost per iteration scales sub-quadratically with total parameter dimensionality. We demonstrate the robustness and scalability of this approach by applying it to both real and synthetic datasets. In particular, an application to data from the spinal cord illustrates that the proposed methods can dramatically decrease the experimental time required to accurately estimate tuning functions.